Search for: All records

Creators/Authors contains: "Xue, Z"

Vid2Coach: Transforming How-To Videos into Task Assistants

Huh, M; Xue, Z; Das, U; Ashutosh, K; Grauman, K; Pavel, A (July 2025, https://doi.org/10.48550/arXiv.2506.00717)

People use videos to learn new recipes, exercises, and crafts. Such videos remain difficult for blind and low vision (BLV) people to follow as they rely on visual comparison. Our observations of visual rehabilitation therapists (VRTs) guiding BLV people to follow how-to videos revealed that VRTs provide both proactive and responsive support including detailed descriptions, non-visual workarounds, and progress feedback. We propose Vid2Coach, a system that transforms how-to videos into wearable camera-based assistants that provide accessible instructions and mixed-initiative feedback. From the video, Vid2Coach generates accessible instructions by augmenting narrated instructions with demonstration details and completion criteria for each step. It then uses retrieval-augmented-generation to extract relevant non-visual workarounds from BLV-specific resources. Vid2Coach then monitors user progress with a camera embedded in commercial smart glasses to provide context-aware instructions, proactive feedback, and answers to user questions. BLV participants (N=8) using Vid2Coach completed cooking tasks with 58.5% fewer errors than when using their typical workflow and wanted to use Vid2Coach in their daily lives. Vid2Coach demonstrates an opportunity for AI visual assistance that strengthens rather than replaces non-visual expertise.
Full Text Available
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Luo, M; Xue, Z; Dimakis, A; Grauman, K (June 2025, CVPR 2025)

Egocentric and exocentric perspectives of human action differ significantly, yet overcoming this extreme viewpoint gap is critical in augmented reality and robotics. We propose VIEWPOINTROSETTA, an approach that unlocks large-scale unpaired ego and exo video data to learn clip-level viewpoint-invariant video representations. Our framework introduces (1) a diffusion-based Rosetta Stone Translator (RST), which, leveraging a moderate amount of synchronized multi-view videos, serves as a translator in feature space to decipher the alignment between unpaired ego and exo data, and (2) a dual encoder that aligns unpaired data representations through contrastive learning with RST-based synthetic feature augmentation and soft alignment. To evaluate the learned features in a standardized setting, we construct a new cross-view benchmark using Ego-Exo4D, covering cross-view retrieval, action recognition, and skill assessment tasks. Our framework demonstrates superior cross-view understanding compared to previous view-invariant learning and ego video representation learning approaches, and opens the door to bringing vast amounts of traditional third-person video to bear on the more nascent first-person setting.
Full Text Available
Progress-Aware Video Frame Captioning

Xue, Z; An, J; Yang, X; Grauman, K (March 2025, https://doi.org/10.48550/arXiv.2412.02071)

While image captioning provides isolated descriptions for individual images, and video captioning offers one single narrative for an entire video clip, our work explores an important middle ground: progress-aware video captioning at the frame level. This novel task aims to generate temporally fine-grained captions that not only accurately describe each frame but also capture the subtle progression of actions throughout a video sequence. Despite the strong capabilities of existing leading vision language models, they often struggle to discern the nuances of frame-wise differences. To address this, we propose ProgressCaptioner, a captioning model designed to capture the fine-grained temporal dynamics within an action sequence. Alongside, we develop the FrameCap dataset to support training and the FrameCapEval benchmark to assess caption quality. The results demonstrate that ProgressCaptioner significantly surpasses leading captioning models, producing precise captions that accurately capture action progression and set a new standard for temporal precision in video captioning. Finally, we showcase practical applications of our approach, specifically in aiding keyframe selection and advancing video understanding, highlighting its broad utility.
Full Text Available
SPOC: Spatially-Progressing Object State Change Segmentation in Video

Mandikal, P; Nagarajan, T; Stoken, A; Xue, Z; Grauman, K (March 2025, https://doi.org/10.48550/arXiv.2503.11953)

Object state changes in video reveal critical information about human and agent activity. However, existing methods are limited to temporal localization of when the object is in its initial state (e.g., the unchopped avocado) versus when it has completed a state change (e.g., the chopped avocado), which limits applicability for any task requiring detailed information about the progress of the actions and its spatial localization. We propose to deepen the problem by introducing the spatially-progressing object state change segmentation task. The goal is to segment at the pixel-level those regions of an object that are actionable and those that are transformed. We introduce the first model to address this task, designing a VLM-based pseudo-labeling approach, state-change dynamics constraints, and a novel WhereToChange benchmark built on in-the-wild Internet videos. Experiments on two datasets validate both the challenge of the new task as well as the promise of our model for localizing exactly where and how fast objects are changing in video. We further demonstrate useful implications for tracking activity progress to benefit robotic agents.
Full Text Available
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

Luo, M; Xue, Z; Dimakis, A; Grauman, K (March 2024, https://doi.org/10.48550/arXiv.2403.06351)

We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective. To this end, we propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation, which explicitly encourages cross-view correspondence between exocentric and egocentric views, and a diffusion-based pixel-level hallucination, which incorporates a hand layout prior to enhance the fidelity of the generated egocentric view. To pave the way for future advancements in this field, we curate a comprehensive exo-to-ego cross-view translation benchmark. It consists of a diverse collection of synchronized ego-exo tabletop activity video pairs sourced from three public datasets: H2O, Aria Pilot, and Assembly101. The experimental results validate that Exo2Ego delivers photorealistic video results with clear hand manipulation details and outperforms several baselines in terms of both synthesis quality and generalization ability to new actions.
Full Text Available
Modeled Coastal‐Ocean Pathways of Land‐Sourced Contaminants in the Aftermath of Hurricane Florence

https://doi.org/10.1029/2023JC019685

Moulton, Melissa; Zambon, Joseph B; Xue, Z George; Warner, John C; Bao, Daoyang; Yin, Dongxiao; Defne, Zafer; He, Ruoying; Hegermiller, Christie (March 2024, Journal of Geophysical Research: Oceans)

Abstract Extreme precipitation during Hurricane Florence, which made landfall in North Carolina in September 2018, led to breaches of hog waste lagoons, coal ash pits, and wastewater facilities. In the weeks following the storm, freshwater discharge carried pollutants, sediment, organic matter, and debris to the coastal ocean, contributing to beach closures, algae blooms, hypoxia, and other ecosystem impacts. Here, the ocean pathways of land‐sourced contaminants following Hurricane Florence are investigated using the Regional Ocean Modeling System (ROMS) with a river point source with fixed water properties from a hydrologic model (WRF‐Hydro) of the Cape Fear River Basin, North Carolina's largest watershed. Patterns of contaminant transport in the coastal ocean are quantified with a finite duration tracer release based on observed flooding of agricultural and industrial facilities. A suite of synthetic events also was simulated to investigate the sensitivity of the river plume transport pathways to river discharge and wind direction. The simulated Hurricane Florence discharge event led to westward (downcoast) transport of contaminants in a coastal current, along with intermittent storage and release of material in an offshore (bulge) or eastward (upcoast) region near the river mouth, modulated by alternating upwelling and downwelling winds. The river plume patterns led to a delayed onset and long duration of contaminants affecting beaches 100 km to the west, days to weeks after the storm. Maps of the onset and duration of hypothetical water quality hazards for a range of weather conditions may provide guidance to managers on the timing of swimming/shellfishing advisories and water quality sampling.
Full Text Available
Artificial Space Weathering to Mimic Solar Wind Enhances the Toxicity of Lunar Dust Simulants in Human Lung Cells

https://doi.org/10.1029/2023GH000840

Chang, J_H M; Xue, Z; Bauer, J; Wehle, B; Hendrix, D A; Catalano, T; Hurowitz, J A; Nekvasil, H; Demple, B (February 2024, GeoHealth)

Abstract During NASA's Apollo missions, inhalation of dust particles from lunar regolith was identified as a potential occupational hazard for astronauts. These fine particles adhered tightly to spacesuits and were unavoidably brought into the living areas of the spacecraft. Apollo astronauts reported that exposure to the dust caused intense respiratory and ocular irritation. This problem is a potential challenge for the Artemis Program, which aims to return humans to the Moon for extended stays in this decade. Since lunar dust is “weathered” by space radiation, solar wind, and the incessant bombardment of micrometeorites, we investigated whether treatment of lunar regolith simulants to mimic space weathering enhanced their toxicity. Two such simulants were employed in this research, Lunar Mare Simulant‐1 (LMS‐1), and Lunar Highlands Simulant‐1 (LHS‐1), which were added to cultures of human lung epithelial cells (A549) to simulate lung exposure to the dusts. In addition to pulverization, previously shown to increase dust toxicity sharply, the simulants were exposed to hydrogen gas at high temperature as a proxy for solar wind exposure. This treatment further increased the toxicity of both simulants, as measured by the disruption of mitochondrial function, and damage to DNA both in mitochondria and in the nucleus. By testing the effects of supplementing the cells with an antioxidant (N‐acetylcysteine), we showed that a substantial component of this toxicity arises from free radicals. It remains to be determined to what extent the radicals arise from the dust itself, as opposed to their active generation by inflammatory processes in the treated cells.
Full Text Available
A Numerical reassessment of the Gulf of Mexico carbon system in connection with the Mississippi River and global ocean

https://doi.org/10.5194/bg-19-4589-2022

Zhang, Le; Xue, Z. George (January 2022, Biogeosciences)

Abstract. Coupled physical–biogeochemical models can fill the spatial and temporal gap in ocean carbon observations. Challenges of applying a coupled physical–biogeochemical model in the regional ocean include the reasonable prescription of carbon model boundary conditions, lack of in situ observations, and the oversimplification of certain biogeochemical processes. In this study, we applied a coupled physical–biogeochemical model (Regional Ocean Modelling System, ROMS) to the Gulf of Mexico (GoM) and achieved an unprecedented 20-year high-resolution (5 km, 1/22°) hindcast covering the period of 2000 to 2019. The biogeochemical model incorporated the dynamics of dissolved organic carbon (DOC) pools and the formation and dissolution of carbonate minerals. The biogeochemical boundaries were interpolated from NCAR's CESM2-WACCM-FV2 solution after evaluating the performance of 17 GCMs in the GoM waters. Model outputs included carbon system variables of wide interest, such as pCO2, pH, aragonite saturation state (ΩArag), calcite saturation state (ΩCalc), CO2 air–sea flux, and carbon burial rate. The model's robustness is evaluated via extensive model–data comparison against buoys, remote-sensing-based machine learning (ML) products, and ship-based measurements. A reassessment of air–sea CO2 flux with previous modeling and observational studies gives us confidence that our model provides a robust and updated CO2 flux estimation, and NGoM is a stronger carbon sink than previously reported. Model results reveal that the GoM water has been experiencing a ∼ 0.0016 yr⁻¹ decrease in surface pH over the past 2 decades, accompanied by a ∼ 1.66 µatm yr⁻¹ increase in sea surface pCO2. The air–sea CO2 exchange estimation confirms in accordance with several previous models and ocean surface pCO2 observations that the river-dominated northern GoM (NGoM) is a substantial carbon sink, and the open GoM is a carbon source during summer and a carbon sink for the rest of the year. Sensitivity experiments are conducted to evaluate the impacts of river inputs and the global ocean via model boundaries. The NGoM carbon system is directly modified by the enormous carbon inputs (∼ 15.5 Tg C yr⁻¹ DIC and ∼ 2.3 Tg C yr⁻¹ DOC) from the Mississippi–Atchafalaya River System (MARS). Additionally, nutrient-stimulated biological activities create a ∼ 105 times higher particulate organic matter burial rate in NGoM sediment than in the case without river-delivered nutrients. The carbon system condition of the open ocean is driven by inputs from the Caribbean Sea via the Yucatan Channel and is affected more by thermal effects than biological factors.
Full Text Available
Temperature Across Vegetation Canopy-Water-Soil Interfaces Is Modulated by Hydroperiod and Extreme Weather in Coastal Wetlands

https://doi.org/10.3389/fmars.2022.852901

Zhao, Xiaochen; Rivera-Monroy, Victor H.; Li, Chunyan; Vargas-Lopez, Ivan A.; Rohli, Robert V.; Xue, Z. George; Castañeda-Moya, Edward; Coronado-Molina, Carlos (May 2022, Frontiers in Marine Science)

Environmental temperature is a widely used variable to describe weather and climate conditions. The use of temperature anomalies to identify variations in climate and weather systems makes temperature a key variable to evaluate not only climate variability but also shifts in ecosystem structural and functional properties. In contrast to terrestrial ecosystems, the assessment of regional temperature anomalies in coastal wetlands is more complex since the local temperature is modulated by hydrology and weather. Thus, it is unknown how the regional free-air temperature (T_Free) is coupled to local temperature anomalies, which can vary across interfaces among vegetation canopy, water, and soil that modify the wetland microclimate regime. Here, we investigated the temperature differences (offsets) at those three interfaces in mangrove-saltmarsh ecotones in coastal Louisiana and South Florida in the northern Gulf of Mexico (2017–2019). We found that the canopy offset (range: 0.2–1.6°C) between T_Free and below-canopy temperature (T_Canopy) was caused by the canopy buffering effect. The similar offset values in both Louisiana and Florida underscore the role of vegetation in regulating near-ground energy fluxes. Overall, the inundation depth did not influence soil temperature (T_Soil). The interaction between frequency and duration of inundation, however, significantly modulated T_Soil given the presence of water on the wetland soil surface, thus attenuating any short- or long-term changes in the T_Canopy and T_Free. Extreme weather events—including cold fronts and tropical cyclones—induced high defoliation and weakened canopy buffering, resulting in long-term changes in canopy or soil offsets. These results highlight the need to measure simultaneously the interaction between ecological and climatic processes to reduce uncertainty when modeling macro- and microclimate in coastal areas under a changing climate, especially given the current local temperature anomalies data scarcity. This work advances the coupling of Earth system models to climate models to forecast regional and global climate change and variability along coastal areas.
Full Text Available
Dissolved Inorganic Carbon Transport in the Surface‐Mixed Layer of the Louisiana Shelf in Northern Gulf of Mexico

https://doi.org/10.1029/2020JC016605

Anderson, M M; Maiti, K; Xue, Z George; Ou, Y (November 2020, Journal of Geophysical Research: Oceans)

Abstract Rivers and wetlands are a major source of terrestrial derived carbon for coastal ocean margins. This results in a net loss of terrestrial carbon into the shelf water and their subsequent transport to interior ocean basin. This study investigates the transport of dissolved inorganic carbon (DIC) in the surface‐mixed layer of Louisiana Shelf in northern Gulf of Mexico (nGOM) adjacent to the Wax Lake Delta (WLD) and Barataria Bay (BB), which represent contrasting net land gain and net land loss areas in this region. DIC samples were collected, in conjunction with short‐lived radium isotopes ²²⁴Ra (t_1/2 = 3.66 days) and ²²³Ra (t_1/2 = 11.43 days) samples during June and September 2019, to quantify shelf transport of DIC in the surface‐mixed layer during period of high and low river flow, respectively. Radium distribution implied shelf mixing rates of 140–6,759 and 63–2,724 m² s⁻¹ for WLD and BB regions, respectively, with more than tenfold decrease in rates between the two seasons. Net shelf transport of DIC was found to be highest for the WLD region in June, highlighting the importance of freshwater discharge in exporting DIC. An upscaling of our study for the entire Louisiana Shelf indicates that 1.54–20.19 × 10⁹ mol C d⁻¹ transported in June 2019 and 0.34–8.12 × 10⁹ mol C d⁻¹ in the form of DIC was exported across the shallow region of the shelf during high and low river flow seasons, representing an important source of DIC to the NGOM.
Full Text Available
